Abstract-Today, money laundering poses a serious threat not only to financial institutions but also to
nations. This criminal activity is becoming more and more sophisticated and seems to have moved from the
cliche of drug trafficking to financing terrorism, and surely not forgetting personal gain. Most international
financial institutions have been implementing anti-money laundering solutions to fight investment fraud.
On the other hand, cloud-based applications are emerging daily and bringing benefits with lower cost of
platforms and data storage, greater scalability and improved business continuity. Hence, more financial
institutions aim to move their IT infrastructure to the cloud. However, direct access to the customer
transaction datasets by a third party could be a confidentiality issue. This issue is more severe when these
solutions are built by collaborating partners. Traditional methods are based on data access agreements but
there is still a risk of infringing privacy. In order to preserve the privacy of datasets, different data disguising
methods have been proposed. Nevertheless, analysing disguised datasets is a performance issue in the
context of detecting suspicious money laundering cases, where the real value of data has an important
impact. Indeed, the results of analysis could also be a confidentiality issue. Within the scope of a collaboration
project for developing a new cloud-based solution for the Anti-Money Laundering Units in an international
investment bank, in this paper, we propose a new cloud-based approach using data disguising methods
applied in analysing transaction datasets. We also show that creating relevant dimensions from the
current ones is efficient for analysing transaction datasets in terms of both detecting suspicious cases and
preserving privacy.


Introduction 

Money laundering (ML) is a process to make illegitimate income appear legitimate; this is also the process
by which criminals attempt to conceal the true origin and ownership of the proceeds of their criminal
activity. Through ML, criminals try to convert monetary proceeds derived from illicit activities into "clean"
funds using a legal medium such as large investment or pension funds hosted in retail or investment banks
[1]. This type of criminal activity is getting more and more sophisticated. Nations care about ML because
they care about their political and economic stability. Therefore, anti-money laundering (AML) is of critical
significance to national financial stability and international security. Traditional approaches to AML
followed a labour-intensive manual approach because ML is a sophisticated activity with many ways of
laundering money. Recently, AML approaches based on data mining (DM) techniques [2] have been
proposed and discussed in the literature. Most of these approaches try to recognize ML patterns by
different techniques such as support vector machine [6], correlation analysis [3], histogram analysis [3][5],
clustering [8], etc. However, there has been growing concern that the use of DM technology may violate
individual privacy when it accesses and analyses real datasets [12]. This problem becomes more and more
severe when solutions are provided by third-party companies: even though these datasets are protected by
data access agreements, there is a risk of infringing the privacy of customer information [4] and transactions.
Besides, privacy preserving data mining (PPDM) aims at developing models and techniques about
aggregated data without direct access to all detailed information of individual transactions. However, there
is still little research on applying these methods to real datasets such as customer transactions from a bank.

Today, cloud-based applications and new capabilities are emerging daily, bringing with them lower cost of
entry, pay-for-use processor and data-storage models, greater scalability, improved performance, ease of
redundancy and improved business continuity. Hence, more and more financial institutes select cloud
computing as a solution for their IT platforms and services. However, preserving privacy when accessing
customer data is a big issue for these institutes, especially for banks, whose reputation would be affected,
because direct access to the customer transaction datasets by a third party is a confidentiality issue. This
issue is more severe when these solutions are built by collaborating partners. Traditional methods are
based on data access agreements but there is still a risk of infringing privacy. In order to preserve the privacy
of datasets, we should look at different data disguising methods. Besides, the results of data analysis,
especially in the case of money laundering, are also confidential data whose privacy must be preserved.

In this paper, we present a framework for a cloud-based solution to detect suspicious cases of ML that can
also preserve the privacy of confidential data. We also show that, in our approach, new dimensions
created appropriately from current ones can be efficiently used to analyse transaction datasets. The rest of
this paper is organised as follows: Section II deals with the background of our research. Section III presents our
approach of a cloud-based framework for detecting ML and methods for disguising data applied to detect
suspicious cases of ML activities. We evaluate our methods with real customer transaction datasets in
Section IV. We also analyse and discuss the results of our approach in that section. Finally, we conclude in
Section V.






II. Background 

A. Data Mining Techniques for AML and Privacy Preserving Data Mining

One approach to analysing data in AML uses support vector machines (SVM) [6]. In [7], the authors proposed
an extension of SVM to detect unusual customer behaviour. They present a combination of an improved
RBF kernel [8] with the definition of distinct distance [9] and supervised/unsupervised SVM algorithms
(C-SVM, one-class SVM). Even though DM techniques show their efficiency in detecting suspicious cases of
ML, they could lead to the potential misuse of data.



On the other hand, PPDM models and techniques attempt to aggregate data without accessing the
original information of individuals. Some of the most used techniques include randomization [13],
k-anonymity [14], cryptography and data transformation [15][16]. However, these methods have a performance
issue with outliers [13] and distance preservation, or it is difficult to analyse the behaviour of customers by using
transformed values. These approaches also have a performance issue when the analysis is carried out
outside the financial institutions.




B. SaaS for Analysing Transaction Datasets




Recently, there are many SaaS solutions for analysing transaction datasets [17][18][19]. Most of them focus
on analysing transaction datasets for applications such as sales prediction, etc. However, to the best of our
knowledge, there is not any SaaS developed for AML.

III. Cloud-Based Solution for Detecting ML 

A. Challenges of a SaaS for Analysing Confidential Data 

In the first scenario, confidential data is stored locally inside the financial institutions. So, they can use a
SaaS AML solution to analyse their data instead of buying AML software. In fact, using SaaS in this case
could lead to security issues as SaaS solutions communicate periodically with servers to exchange
data/information. This communication can be performed over a secure channel such as SSL/TLS to
guarantee privacy. However, the financial institutions also have to run their own data centre.

Figure 1. SaaS for analysing confidential data

In the second scenario, confidential data can be stored in the cloud by using a cloud-based data storage
service. Although cloud-based data storage has many advantages such as reliability and ease of
access, it also has issues that mean some users will never feel comfortable with their data in the cloud, such as its
performance, security and data orphans. It is more severe in the case of data of financial
organisations. Users may abandon data in cloud storage facilities, leaving confidential data at risk. Besides,
sharing storage devices across multiple customers can lead to data leaking. For example, when a file
system deletes a file on disk, it simply marks the locations within which the file resides as available for use
to store other files. If other customers come along and allocate space on the disk for storage, they can
examine the allocated space and may have access to previously deleted confidential data. Moreover, there
are different kinds of attacks such as Distributed Denial-of-Service (DDoS), Packet Sniffing,
Man-in-the-Middle, etc. [20]. A popular solution is to encrypt the data for both data storage in the cloud and
data communication over the cloud. Data is only decrypted at the end-point user side (Fig. 1a). In this case, if
financial institutions use a SaaS AML to analyse their cloud-based storage data, this SaaS can only read
encrypted data. However, in order to analyse data, the SaaS solution has to decrypt it inside the cloud and this
task also raises an issue of security. In this case, the SaaS should have the capacity of analysing
encrypted data directly and therefore techniques of privacy preserving data mining should be considered.
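
The deleted-file leak described above can be illustrated with a toy model. The sketch below is not a real file system or cloud API: the class `ToyDisk` and its methods are hypothetical names introduced only for illustration, and the model assumes that deletion merely marks blocks as free without overwriting them.

```python
from __future__ import annotations


class ToyDisk:
    """A toy block device shared by several tenants (illustrative only)."""

    def __init__(self, n_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(n_blocks)]
        self.free = set(range(n_blocks))

    def write_file(self, data: bytes) -> list[int]:
        """Store data in free blocks and return the block indices used."""
        needed = -(-len(data) // self.block_size)  # ceiling division
        if needed > len(self.free):
            raise OSError("disk full")
        used = sorted(self.free)[:needed]
        for i, idx in enumerate(used):
            chunk = data[i * self.block_size:(i + 1) * self.block_size]
            self.blocks[idx] = chunk.ljust(self.block_size, b"\0")
            self.free.discard(idx)
        return used

    def delete_file(self, used: list[int]) -> None:
        """Mark blocks as free WITHOUT overwriting their contents."""
        self.free.update(used)

    def allocate_raw(self, n: int) -> list[bytes]:
        """Hand n free blocks to a new tenant, stale contents included."""
        idxs = sorted(self.free)[:n]
        self.free.difference_update(idxs)
        return [self.blocks[i] for i in idxs]


# Tenant A stores and then deletes a confidential record.
disk = ToyDisk(n_blocks=4)
blocks = disk.write_file(b"acct=42;amt=9000")
disk.delete_file(blocks)

# Tenant B allocates space and can read what tenant A left behind.
leaked = b"".join(disk.allocate_raw(1))
```

In this toy model the second tenant recovers the first tenant's record unchanged, which is why storing only encrypted or disguised values in shared storage limits what such a leak can reveal.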

In fact, there is one more solution for a SaaS AML approach, where it can download data from the cloud to the
end-user platform and perform the required analysis. However, this solution does not scale well with large
financial datasets and it has to deal with the same issues as the first scenario above.




B. Cloud-Based Framework for Detecting Money Laundering

We present in this section our cloud-based framework for building AML solutions. As shown in Figure 2, in
our framework, we presume that users store their data in cloud-based storage. However, this data is not
encrypted by the public/private keys paradigm but by privacy preserving techniques that will be described in
Section 4. We shall call these techniques data disguising techniques. We have two scenarios here. In the
first scenario, data is moved from the local data centre to the cloud, and the second scenario presumes that data
already exists in the cloud. In the first scenario, data is disguised before being sent to the cloud (Fig. 1b (1)).
Updating of data consists of updating both the transactional data and the disguised data. In the second
case, a tool is developed to integrate into the cloud-based storage application to disguise confidential data. Our
framework supports both scenarios. We describe it in more detail in the following paragraphs.



There are two main components in our framework: data analysis (Fig. 1b (II)) and data conversion (Fig. 1b (I)
& (III)). Our data conversion module can deal with two different scenarios as follows:

Scenario 1: Moving data centre to cloud-based storage. In this case, users want to move their data
centre into the cloud. Normally, they use services from the cloud providers to perform this process. They
can apply at the same time our disguised data application to distort their local data (Fig. 1b (1)) and store it
in the cloud (Fig. 1b (8)). This disguised data application is developed as a plug-in (Fig. 1b (III)) that can be
integrated into not only our SaaS AML solution (SAML) but also into migration solutions offered by cloud
providers.

Scenario 2: Disguising cloud-based data. In this case, again, our disguised data application can be
plugged in to not only our SAML but also other data management tools of cloud providers to convert
customers' data to distorted data (Fig. 1b (I)).



In both scenarios, to distort data, our data conversion component uses methods discussed in the next
sub-section. The data analysis component (Fig. 1b (II)) consists of statistical and data mining methods to analyse
the disguised data to detect suspicious cases of ML.




C. Approaches for Disguising Data 






In this sub-section, we describe our approach for disguising data, which is implemented in our data
conversion component. This approach consists of creating new dimensions, and the main purpose of this
method is to disguise datasets while preserving the distance among them in order to retain the accuracy of
both statistics and clustering results. Generally, AML experts consider the following dimensions:
frequency of subscriptions, frequency of redemptions, subscription value, redemption value and current
balance. All these features are conditional on time: daily, weekly, monthly, etc. However, analysing
these dimensions directly would raise a concern of privacy. Based on the experience of AML experts, we
created new dimensions based on the current significant ones above. In fact, we defined six new dimensions:
A1, A2, A3, A4, A5 and A6. A1 is the proportion between the redemption value and the subscription value
conditional on time (daily, weekly, monthly, etc.) and A2 is the proportion between a specific redemption
value and the total value of the investor's shares conditional on time. Note that the value of the transactions
(subscription or redemption) of each investor in an investment fund is aggregated by time (daily, weekly...).
The definition of A3 is based on the proportion between the frequency of subscription and its average value.
If the value of A3 is close to 1 then the frequency of subscription is significantly high compared to its
average value. This is also a remarkable sign of suspicious behaviour. The definition of A4, A5 and A6 is the
same as that of A3: A4 is for the frequency of redemption; A5 and A6 are for the amount of subscription and
redemption respectively. Briefly, the original datasets with subscription and redemption values will be
disguised to six parameters.
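
The first two dimensions described above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation used in the framework: the record layout `(investor, period, kind, value)`, the `balances` input and the function name `disguise` are hypothetical, and A3 to A6 are left out because the text does not give their exact formula.

```python
from collections import defaultdict


def disguise(transactions, balances):
    """Aggregate raw transactions per (investor, period) and return ratios.

    transactions: iterable of (investor, period, kind, value), where kind is
        "subscription" or "redemption" (hypothetical record layout).
    balances: dict mapping (investor, period) to the total value of the
        investor's shares in that period (hypothetical input).
    """
    sub = defaultdict(float)
    red = defaultdict(float)
    for investor, period, kind, value in transactions:
        if kind == "subscription":
            sub[(investor, period)] += value
        elif kind == "redemption":
            red[(investor, period)] += value
        else:
            raise ValueError(f"unknown transaction kind: {kind!r}")

    disguised = {}
    for key in sorted(set(sub) | set(red)):
        # A1: redemption value relative to subscription value in the period.
        a1 = red[key] / sub[key] if sub[key] > 0 else None
        # A2: redemption value relative to the total value of shares held.
        total = balances.get(key, 0.0)
        a2 = red[key] / total if total > 0 else None
        disguised[key] = {"A1": a1, "A2": a2}
    return disguised


# Made-up records for two investors in one weekly period.
transactions = [
    ("inv1", "W01", "subscription", 1000.0),
    ("inv1", "W01", "redemption", 900.0),
    ("inv2", "W01", "subscription", 500.0),
]
balances = {("inv1", "W01"): 1200.0, ("inv2", "W01"): 500.0}
out = disguise(transactions, balances)
```

Only the ratios leave the function, so the amounts subscribed, redeemed and held are not visible in the disguised output. Periods with no subscription or no recorded balance yield `None` rather than a division error, which is a design choice of this sketch.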



V. Experiments 



Our cloud-based framework has been implemented. In this framework, the most important component is
the data conversion, because it should guarantee both the accuracy of the analysis carried out by the data
analysis process and the privacy of the data. Hence, in this paper, we evaluate first of all the
performance of the data conversion component. We use transactions from 2 of 15 funds administered by BEP
bank with around [...] million transaction records of about 3 thousand customers in the last ten years. The
original data is distorted into six new dimensions. We evaluate first of all the performance of our approach, i.e.
the capacity of analysing data based on the new dimensions. As mentioned in the previous section, the two most
important parameters are A1 and A2. Hence, we start by evaluating these two parameters.

Observing Figure 2, we can first notice that these new dimensions can hide sensitive information, i.e. the real
value of investment (values of subscription/redemption/balance). Moreover, it is important to note that
(A1, A2) reflects customer behaviours well. A double check with the AML Unit also confirms our conclusion.
Meanwhile, there are cases with high values of (A1, A2) in the fund SK. Consequently, they are suspicious cases
in this fund.

Figure 2. Suspicious Factor of SK fund
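
Reading Figure 2 as "cases with high (A1, A2) are suspicious" can be written as a simple filter. The sketch below is only an assumption about how such a reading might be automated: the function name `flag_suspicious` and both thresholds are hypothetical, and the text does not state any cut-off values or that both dimensions must be high at once.

```python
def flag_suspicious(points, a1_threshold, a2_threshold):
    """Return keys whose A1 and A2 both reach caller-supplied thresholds."""
    flagged = []
    for key, dims in points.items():
        a1, a2 = dims.get("A1"), dims.get("A2")
        if a1 is None or a2 is None:
            continue  # dimension undefined for this case; nothing to compare
        if a1 >= a1_threshold and a2 >= a2_threshold:
            flagged.append(key)
    return flagged


# Made-up disguised values for three cases.
points = {
    "c1": {"A1": 0.9, "A2": 0.75},
    "c2": {"A1": 0.1, "A2": 0.05},
    "c3": {"A1": None, "A2": 0.9},
}
flagged = flag_suspicious(points, a1_threshold=0.8, a2_threshold=0.5)
```

The filter works on disguised values only, so it can run outside the financial institute, and the flagged cases still need review by AML experts.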

This analysis can be carried out outside the financial institute and the clustering results are sent back, where
further analyses are needed to find out the origin: either they are real suspicious cases or they are only trends in
investment activities such as exchange transactions [8]. The other parameters A3, A4, A5 and A6 can be used as
additional parameters that can first of all support the AML expert in the detection of suspicious cases of ML by
combining them with the first two parameters. Besides, these parameters can also be used to hide the
results of the analysis. As mentioned in previous sections, financial institutes normally aim not only to distort
their datasets but also to hide the results of the analysis, as they do not want to disclose information such as the
fact that some suspicious cases of money laundering were detected by the SaaS solution. Therefore, the meaning of
each dimension from A1 to A6 is defined at the user level, not at the analysis level. So, at the analysis level, all
dimensions are processed with the same techniques and the results are sent back to users. Users can link results
with the meaning of each dimension to interpret them by using a Data interpreter component (Fig. 1b (IV)). In
conclusion, the creation of new dimensions is efficient in the context of detecting suspicious cases of ML
and it can moreover be performed outside the financial institute. This requires strong knowledge of
business requirements to decide which dimensions will be created. However, it can be carried out by
internal experts of financial institutes before sending disguised datasets to outside partners. Hence, our
approach integrates knowledge from AML experts to create efficient and relevant dimensions.
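
The separation between the user level and the analysis level described above can be sketched as follows. This is an illustration under assumptions rather than the Data interpreter component itself, whose design the text does not detail: the functions `mask_dimensions`, `analyse` and `interpret`, the opaque labels `D1`, `D2`, ... and the stand-in analysis (reporting the row with the largest value per dimension) are all hypothetical.

```python
import random


def mask_dimensions(rows, names, seed=None):
    """Rename dimension columns to opaque labels; return rows and the key."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    key = {f"D{i + 1}": name for i, name in enumerate(shuffled)}
    inverse = {name: label for label, name in key.items()}
    masked = [{inverse[n]: row[n] for n in names} for row in rows]
    return masked, key


def analyse(masked_rows):
    """Stand-in for the external analysis: every dimension is treated alike
    and, per opaque label, the index of the row with the largest value is
    reported."""
    labels = masked_rows[0].keys()
    return {
        lab: max(range(len(masked_rows)), key=lambda i: masked_rows[i][lab])
        for lab in labels
    }


def interpret(result, key):
    """User-side step: map opaque labels back to their meaning."""
    return {key[label]: value for label, value in result.items()}


# The user keeps `key`; only `masked` is sent to the analysis service.
rows = [{"A1": 0.2, "A2": 0.1}, {"A1": 0.95, "A2": 0.8}]
masked, key = mask_dimensions(rows, ["A1", "A2"], seed=7)
report = interpret(analyse(masked), key)
```

Because the analysis side sees only the opaque labels, it cannot tell which dimension a result refers to. Only the user, who holds `key`, can read the report in terms of A1 to A6.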

VI. Conclusion



In the context of detecting suspicious cases of ML in an investment bank, in this paper, we present a
framework for a cloud-based solution that can preserve the privacy of confidential data. This framework
allows us to distort data while preserving the distance between elements, which has an important impact on
clustering analysis. Experts from an external institute can then analyse these disguised datasets and
send the results back to the financial institute. Hence, cloud-based tools can use this approach for
developing solutions for AML. Our framework was developed as a SaaS. More experiments are being carried
out with real-world datasets. In the next step, more disguising techniques will be analysed in the same
context. Experimental results for more datasets are also being produced and these will allow us to test and
evaluate the robustness of our approach.